Scenario User Story: As a city planner for Melbourne, I need to identify the top five hotspots for business establishments within the city so that I can better allocate resources, plan infrastructure, and support local economic development.
- Economic Growth: Target areas with high potential for business development.
- Data Utilization: Leverage business establishment data for informed decision-making.
At the end of this use case, we will be able to:
- Understand how to process and analyze geospatial data
- Use Python libraries like Pandas, Geopandas, and Matplotlib for data analysis and visualization
- Identify key business hotspots using clustering techniques
- Present data-driven insights for urban planning
In this use case, I focused on identifying the top five hotspots for business establishments in Melbourne. The rationale behind solving this problem is to help city planners and stakeholders make informed decisions about where to focus resources for infrastructure development, public services, and economic support. By analyzing the distribution of business establishments across the city, we can uncover patterns and trends that are crucial for effective urban planning.
The dataset used for this analysis consists of business establishment records for Melbourne, which contain information such as location coordinates, business types, and establishment density. The data was sourced from the City of Melbourne's open data portal and then cleaned and processed to ensure accuracy and relevance for this analysis.
import requests
import pandas as pd
from io import StringIO
# Function to collect data
def collect_data(dataset_id):
    base_url = 'https://data.melbourne.vic.gov.au/api/explore/v2.1/catalog/datasets/'
    #apikey = "" # use if datasets require API key permissions
    export_format = 'csv'  # renamed from `format` to avoid shadowing the built-in
    url = f'{base_url}{dataset_id}/exports/{export_format}'
    params = {
        'select': '*',
        'limit': -1, # all records
        'lang': 'en',
        'timezone': 'UTC',
        # 'api_key': apikey
    }
    # GET request
    response = requests.get(url, params=params)
    if response.status_code == 200:
        # StringIO to read the CSV data
        url_content = response.content.decode('utf-8')
        dataset = pd.read_csv(StringIO(url_content), delimiter=';')
        return dataset
    else:
        print(f'Request failed with status code {response.status_code}')
        return None
# Set dataset_id to query for the API call
dataset_id = 'business-establishments-with-address-and-industry-classification'
# Save dataset to df variable
df = collect_data(dataset_id)
# Check number of records in df
print(f'The dataset contains {len(df)} records.')
# View df
df.head(5)
The dataset contains 374210 records.
| | census_year | block_id | property_id | base_property_id | clue_small_area | trading_name | business_address | industry_anzsic4_code | industry_anzsic4_description | longitude | latitude |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2003 | 105 | 100172 | 100172 | Melbourne (CBD) | Wilson Parking Australia | 24-46 A'Beckett Street MELBOURNE 3000 | 9533 | Parking Services | 144.962053 | -37.808573 |
| 1 | 2003 | 105 | 103301 | 103301 | Melbourne (CBD) | Melbourne International Backpackers | 442-450 Elizabeth Street MELBOURNE 3000 | 4400 | Accommodation | 144.960868 | -37.808309 |
| 2 | 2003 | 105 | 103302 | 103302 | Melbourne (CBD) | Vacant | 422-440 Elizabeth Street MELBOURNE 3000 | 0 | Vacant Space | 144.961017 | -37.808630 |
| 3 | 2003 | 105 | 103302 | 103302 | Melbourne (CBD) | The Garden Cafe | Shop 3, Ground , 422-440 Elizabeth Street MELB... | 4511 | Cafes and Restaurants | 144.961017 | -37.808630 |
| 4 | 2003 | 105 | 103302 | 103302 | Melbourne (CBD) | Telephony Australia | Shop 5, Ground , 422-440 Elizabeth Street MELB... | 5809 | Other Telecommunications Services | 144.961017 | -37.808630 |
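The `collect_data` function passes `delimiter=';'` because the export is read as semicolon-delimited text. The following is a minimal sketch of that parsing step on its own, using a hypothetical two-row sample (a subset of the columns shown above) instead of a live API call:

```python
from io import StringIO

import pandas as pd

# Hypothetical sample in the same semicolon-delimited layout (subset of columns).
sample_csv = (
    "census_year;trading_name;longitude;latitude\n"
    "2003;Wilson Parking Australia;144.962053;-37.808573\n"
    "2003;The Garden Cafe;144.961017;-37.808630\n"
)

# Without delimiter=';' pandas would read each line as a single column.
sample_df = pd.read_csv(StringIO(sample_csv), delimiter=";")
print(sample_df.shape)
print(sample_df.dtypes)
```

Checking `shape` and `dtypes` right after loading is a quick way to confirm that the delimiter was interpreted as intended and that the coordinates arrived as numbers.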
Step 3: Check for Missing Values in Latitude and Longitude Columns
Below is a Python script that checks for missing values in the latitude and longitude columns of the dataset:
# Check for missing values in latitude and longitude columns
missing_latitude = df['latitude'].isnull().sum()
missing_longitude = df['longitude'].isnull().sum()
print(f"Missing values in 'latitude': {missing_latitude}")
print(f"Missing values in 'longitude': {missing_longitude}")
Missing values in 'latitude': 4785
Missing values in 'longitude': 4785
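The counts above amount to about 1.3% of the 374210 records. As a small sketch on made-up data (the `toy` frame is hypothetical), the same check can be expressed as a share per column, together with a test for rows where only one of the two coordinates is missing:

```python
import numpy as np
import pandas as pd

# Toy frame standing in for df; the values are made up for illustration.
toy = pd.DataFrame({
    "trading_name": ["A", "B", "C", "D"],
    "latitude": [-37.81, np.nan, -37.80, np.nan],
    "longitude": [144.96, np.nan, 144.95, np.nan],
})

# isna().mean() gives the fraction of missing values per column.
missing_share = toy[["latitude", "longitude"]].isna().mean()
print(missing_share)

# Rows where exactly one of the two coordinates is missing.
partial = toy["latitude"].isna() ^ toy["longitude"].isna()
print(int(partial.sum()))
```

If the second number is zero, latitude and longitude are always missing together, so dropping on either column removes the same rows.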
Step 4: Clean the Dataset by Dropping Rows with Missing Values
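# Drop rows with missing latitude or longitude values
# .copy() makes df_cleaned an independent DataFrame, so later column assignments do not raise SettingWithCopyWarning
df_cleaned = df.dropna(subset=['latitude', 'longitude']).copy()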
# Confirm that there are no missing values left
missing_latitude_cleaned = df_cleaned['latitude'].isnull().sum()
missing_longitude_cleaned = df_cleaned['longitude'].isnull().sum()
print(f"Missing values after cleaning in 'latitude': {missing_latitude_cleaned}")
print(f"Missing values after cleaning in 'longitude': {missing_longitude_cleaned}")
Missing values after cleaning in 'latitude': 0
Missing values after cleaning in 'longitude': 0
Step 5: Ensure Latitude and Longitude Columns are Numeric
# Ensure the latitude and longitude are numeric using .loc to avoid SettingWithCopyWarning
df_cleaned.loc[:, 'latitude'] = pd.to_numeric(df_cleaned['latitude'], errors='coerce')
df_cleaned.loc[:, 'longitude'] = pd.to_numeric(df_cleaned['longitude'], errors='coerce')
# Check the data types to confirm
print(df_cleaned[['latitude', 'longitude']].dtypes)
latitude float64
longitude float64
dtype: object
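One thing to keep in mind with `errors='coerce'` is that any value that cannot be parsed becomes `NaN` without raising an error. A minimal sketch on a hypothetical series shows how to count the values lost in the conversion:

```python
import pandas as pd

# Hypothetical raw values; "n/a" cannot be parsed as a number.
raw = pd.Series(["-37.8136", "144.9631", "n/a"])

coerced = pd.to_numeric(raw, errors="coerce")

# Values that were present before conversion but are NaN afterwards.
newly_missing = int(coerced.isna().sum() - raw.isna().sum())
print(coerced)
print(f"Values lost to coercion: {newly_missing}")
```

If this count is ever non-zero on the real columns, the missing-value check from Step 3 would need to be repeated after the conversion.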
Step 6: Apply K-Means Clustering to Identify Business Hotspots
from sklearn.cluster import KMeans
# Extract the latitude and longitude columns for clustering
locations = df_cleaned[['latitude', 'longitude']]
# Apply K-Means clustering to identify 5 clusters (hotspots)
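# n_init is set explicitly so the result does not depend on the scikit-learn version's default
kmeans = KMeans(n_clusters=5, n_init=10, random_state=0)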
df_cleaned['cluster'] = kmeans.fit_predict(locations)
# Count the number of businesses in each cluster
cluster_counts = df_cleaned['cluster'].value_counts().sort_values(ascending=False)
# Display the top 5 clusters
print(cluster_counts.head(5))
cluster
0 156255
4 124877
3 56994
1 19448
2 11851
Name: count, dtype: int64
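K-Means measures straight-line distance between points, so clustering raw latitude and longitude treats one degree in each direction as equal. At Melbourne's latitude a degree of longitude covers noticeably less ground than a degree of latitude. The sketch below is one possible adjustment, not part of the original analysis: it projects coordinates to approximate local metres with an equirectangular projection before clustering. The function name `to_local_metres` and the Earth-radius constant are assumptions introduced for illustration.

```python
import numpy as np

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius, an approximation

def to_local_metres(lat_deg, lon_deg, ref_lat_deg, ref_lon_deg):
    """Equirectangular projection around a reference point (hypothetical helper)."""
    lat = np.radians(np.asarray(lat_deg, dtype=float))
    lon = np.radians(np.asarray(lon_deg, dtype=float))
    ref_lat = np.radians(ref_lat_deg)
    ref_lon = np.radians(ref_lon_deg)
    x = EARTH_RADIUS_M * (lon - ref_lon) * np.cos(ref_lat)  # east-west offset
    y = EARTH_RADIUS_M * (lat - ref_lat)                    # north-south offset
    return np.column_stack([x, y])

# Reference point: the Melbourne map centre used in Step 8.
ref_lat, ref_lon = -37.8136, 144.9631

# Two test points, each 0.01 degrees from the reference in one direction.
xy = to_local_metres([-37.8136, -37.8036], [144.9731, 144.9631], ref_lat, ref_lon)
print(xy.round(0))
```

The resulting two-column array could be passed to `KMeans` in place of the raw coordinates; centroids would then be in metres and would need converting back to degrees before plotting.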
Step 7: Retrieve Centroid Coordinates of Top 5 Hotspots
# Get the centroid coordinates of the top 5 clusters
top_clusters = cluster_counts.index[:5]
centroids = kmeans.cluster_centers_[top_clusters]
# Display the centroids of the top 5 hotspots
for i, centroid in enumerate(centroids):
    print(f"Hotspot {i+1}: Latitude {centroid[0]}, Longitude {centroid[1]}")
Hotspot 1: Latitude -37.81649338709314, Longitude 144.9593746816832
Hotspot 2: Latitude -37.81027385193321, Longitude 144.96895152469776
Hotspot 3: Latitude -37.80272317246289, Longitude 144.9454911043396
Hotspot 4: Latitude -37.81482455911137, Longitude 144.91942891148435
Hotspot 5: Latitude -37.83819289766458, Longitude 144.97715040816652
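To get a feel for how far apart these centroids are on the ground, the great-circle distance between two of them can be computed with the haversine formula. This is a minimal sketch using only the standard library; the helper name `haversine_km` and the mean Earth radius of 6371 km are assumptions for illustration.

```python
from math import asin, cos, radians, sin, sqrt

def haversine_km(lat1, lon1, lat2, lon2, radius_km=6371.0):
    """Great-circle distance between two points given in degrees."""
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    dlat = lat2 - lat1
    dlon = lon2 - lon1
    a = sin(dlat / 2) ** 2 + cos(lat1) * cos(lat2) * sin(dlon / 2) ** 2
    return 2 * radius_km * asin(sqrt(a))

# Centroids of Hotspot 1 and Hotspot 2 as printed above.
hotspot_1 = (-37.81649338709314, 144.9593746816832)
hotspot_2 = (-37.81027385193321, 144.96895152469776)

distance_km = haversine_km(*hotspot_1, *hotspot_2)
print(f"Hotspot 1 to Hotspot 2: {distance_km:.2f} km")
```

Distances of this kind help judge whether two clusters describe distinct areas or neighbouring parts of the same precinct.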
Step 8: Visualize Hotspots on a Map Using Folium (Dynamic Centroid Extraction)
import folium
from sklearn.cluster import KMeans
# Assuming you have already obtained the cleaned dataframe 'df_cleaned'
# Extract the latitude and longitude columns for clustering
locations = df_cleaned[['latitude', 'longitude']]
# Apply K-Means clustering to identify 5 clusters (hotspots)
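# n_init is set explicitly so the result does not depend on the scikit-learn version's default
kmeans = KMeans(n_clusters=5, n_init=10, random_state=0)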
df_cleaned['cluster'] = kmeans.fit_predict(locations)
# Extract the centroid coordinates from the K-Means clustering result
centroids = kmeans.cluster_centers_
# Create a base map centered around Melbourne
melbourne_map = folium.Map(location=[-37.8136, 144.9631], zoom_start=13)
# Add markers for each hotspot centroid
for i, centroid in enumerate(centroids):
    folium.Marker(
        location=[centroid[0], centroid[1]],  # The latitude and longitude of the centroid
        popup=f"Hotspot {i+1}",  # Popup label for the marker
        icon=folium.Icon(color="red", icon="info-sign"),  # Custom icon for the marker
    ).add_to(melbourne_map)
# Save the map to an HTML file
melbourne_map.save("melbourne_hotspots_map.html")
# If running in a Jupyter notebook, you can display the map directly
melbourne_map
Objective: Analyze Existing Business Activity Based on Number of Businesses
The following analysis focuses on assessing the current business activity in Melbourne by evaluating the number of businesses operating in various sectors. This analysis will help identify which sectors have the highest concentration of businesses, providing insights into the city’s economic landscape.
Step 9: Data Inspection and Cleaning
# Inspect column names
print(df.columns)
# Check for missing values
df.isnull().sum()
# Fill missing values or drop rows/columns as necessary
df = df.dropna() # Example: Dropping rows with missing values
# Verify the data is clean
df.head()
Index(['census_year', 'block_id', 'property_id', 'base_property_id',
'clue_small_area', 'trading_name', 'business_address',
'industry_anzsic4_code', 'industry_anzsic4_description', 'longitude',
'latitude'],
dtype='object')
| | census_year | block_id | property_id | base_property_id | clue_small_area | trading_name | business_address | industry_anzsic4_code | industry_anzsic4_description | longitude | latitude |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2003 | 105 | 100172 | 100172 | Melbourne (CBD) | Wilson Parking Australia | 24-46 A'Beckett Street MELBOURNE 3000 | 9533 | Parking Services | 144.962053 | -37.808573 |
| 1 | 2003 | 105 | 103301 | 103301 | Melbourne (CBD) | Melbourne International Backpackers | 442-450 Elizabeth Street MELBOURNE 3000 | 4400 | Accommodation | 144.960868 | -37.808309 |
| 2 | 2003 | 105 | 103302 | 103302 | Melbourne (CBD) | Vacant | 422-440 Elizabeth Street MELBOURNE 3000 | 0 | Vacant Space | 144.961017 | -37.808630 |
| 3 | 2003 | 105 | 103302 | 103302 | Melbourne (CBD) | The Garden Cafe | Shop 3, Ground , 422-440 Elizabeth Street MELB... | 4511 | Cafes and Restaurants | 144.961017 | -37.808630 |
| 4 | 2003 | 105 | 103302 | 103302 | Melbourne (CBD) | Telephony Australia | Shop 5, Ground , 422-440 Elizabeth Street MELB... | 5809 | Other Telecommunications Services | 144.961017 | -37.808630 |
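Note that `df.dropna()` in Step 9 drops a row if any column is missing, which is a stricter filter than the coordinate-only `dropna(subset=...)` used in Step 4. The sketch below compares the two on a hypothetical toy frame so the difference in retained rows is visible:

```python
import numpy as np
import pandas as pd

# Toy frame: one row lacks a trading name, one row lacks coordinates.
toy = pd.DataFrame({
    "trading_name": ["A", None, "C", "D"],
    "latitude": [-37.81, -37.80, np.nan, -37.82],
    "longitude": [144.96, 144.95, np.nan, 144.97],
})

# Drop rows with a missing value in any column.
rows_any = len(toy.dropna())

# Drop rows only when a coordinate is missing.
rows_coords = len(toy.dropna(subset=["latitude", "longitude"]))

print(f"dropna(): {rows_any} rows kept")
print(f"dropna(subset=coords): {rows_coords} rows kept")
```

Because of this, counts computed from `df` after Step 9 and counts computed from `df_cleaned` are based on different sets of rows and are not directly comparable.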
Step 10: Analysis of Business Activity by Industry Classification
# Analysis: Number of businesses by industry classification
industry_column = 'industry_anzsic4_description'
industry_counts = df[industry_column].value_counts()
print(industry_counts)
industry_anzsic4_description
Vacant Space 56221
Cafes and Restaurants 29373
Legal Services 13371
Takeaway Food Services 11185
Computer System Design and Related Services 9456
...
Leather Tanning, Fur Dressing and Leather Product Manufacturing 2
Other Basic Polymer Manufacturing 2
Veterinary Pharmaceutical and Medicinal Product Manufacturing 2
Beef Cattle Farming (Specialised) 1
Other Basic Chemical Product Manufacturing n.e.c. 1
Name: count, Length: 441, dtype: int64
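Since "Vacant Space" is the largest category, it can dominate any ranking of active industries. As a sketch on made-up data (the `industries` series is hypothetical), vacant records can be filtered out before computing each industry's share with `value_counts(normalize=True)`:

```python
import pandas as pd

# Hypothetical sample of industry descriptions.
industries = pd.Series(
    [
        "Vacant Space",
        "Cafes and Restaurants",
        "Vacant Space",
        "Legal Services",
        "Cafes and Restaurants",
        "Cafes and Restaurants",
    ],
    name="industry_anzsic4_description",
)

# Keep only occupied premises, then compute each industry's share.
occupied = industries[industries != "Vacant Space"]
shares = occupied.value_counts(normalize=True)
print(shares)
```

Whether vacant premises should be excluded depends on the question: they are not business activity, but their concentration may itself be of interest to planners.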
Step 13: Analyze Top 10 Business Types in Dynamic Hotspots
import pandas as pd
from sklearn.cluster import KMeans
# Assuming df_cleaned is your cleaned DataFrame with all necessary columns
# Extract the latitude and longitude columns for clustering
locations = df_cleaned[['latitude', 'longitude']]
# Apply K-Means clustering to identify 5 clusters (hotspots)
# n_init is set explicitly so the result does not depend on the scikit-learn version's default
kmeans = KMeans(n_clusters=5, n_init=10, random_state=0)
df_cleaned['cluster'] = kmeans.fit_predict(locations)
# Extract the centroid coordinates from the K-Means clustering result
centroids = kmeans.cluster_centers_
# Half-width of a square box around each centroid, in degrees (e.g., 0.01 degrees)
radius = 0.01
# Filter businesses within the top 5 hotspots
# Note: boxes around nearby centroids can overlap, so a business may be collected more than once
hotspot_businesses = pd.DataFrame()
for centroid in centroids:
    lat, lon = centroid
    filtered_df = df_cleaned[
        (df_cleaned['latitude'].between(lat - radius, lat + radius)) &
        (df_cleaned['longitude'].between(lon - radius, lon + radius))
    ]
    hotspot_businesses = pd.concat([hotspot_businesses, filtered_df])
# Group by industry classification and count the number of businesses
industry_distribution = hotspot_businesses.groupby('industry_anzsic4_description').size()
# Sort by the number of businesses and get the top 10
top_10_industries = industry_distribution.sort_values(ascending=False).head(10)
# Display the top 10 business types
print("Top 10 Business Types for Establishment in Top 5 Hotspots:")
print(top_10_industries)
Top 10 Business Types for Establishment in Top 5 Hotspots:
industry_anzsic4_description
Vacant Space 61647
Cafes and Restaurants 35938
Legal Services 20019
Takeaway Food Services 14828
Other Auxiliary Finance and Investment Services 13054
Computer System Design and Related Services 12332
Management Advice and Other Consulting Services 11873
Hairdressing and Beauty Services 8750
Clothing Retailing 8426
Womens Clothing Retailing 7991
dtype: int64
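The boxes in Step 13 extend 0.01 degrees either side of each centroid, and the Hotspot 1 and Hotspot 2 centroids from Step 7 lie less than 0.01 degrees apart in both coordinates, so their boxes overlap. Concatenating one filtered frame per centroid can therefore count the same row more than once. The following sketch, on hypothetical toy data, shows one way to count each row at most once by combining the box conditions into a single boolean mask:

```python
import numpy as np
import pandas as pd

# Toy data: the first two rows fall inside both boxes defined below.
toy = pd.DataFrame({
    "latitude": [-37.815, -37.812, -37.840, -37.700],
    "longitude": [144.960, 144.965, 144.977, 144.900],
    "industry_anzsic4_description": [
        "Cafes and Restaurants", "Legal Services",
        "Cafes and Restaurants", "Legal Services",
    ],
})
centroids = np.array([[-37.816, 144.959], [-37.810, 144.969]])  # hypothetical
half_width = 0.01

in_any_box = pd.Series(False, index=toy.index)
naive_count = 0
for lat, lon in centroids:
    in_box = (
        toy["latitude"].between(lat - half_width, lat + half_width)
        & toy["longitude"].between(lon - half_width, lon + half_width)
    )
    naive_count += int(in_box.sum())  # what per-centroid concatenation would yield
    in_any_box |= in_box              # each row is flagged at most once

unique_hotspot_rows = toy[in_any_box]
print(f"Rows via concatenation: {naive_count}")
print(f"Unique rows: {len(unique_hotspot_rows)}")
```

An alternative with the same effect would be to group directly on the `cluster` label, since K-Means assigns every row to exactly one cluster.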
Step 14: Visualize Top 10 Business Types with a Horizontal Bar Chart
import seaborn as sns
import matplotlib.pyplot as plt
# Plot the horizontal bar chart with color coding
plt.figure(figsize=(10, 8))
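sns.barplot(
    x=top_10_industries.values,
    y=top_10_industries.index,
    hue=top_10_industries.index,  # assign hue explicitly, as newer seaborn versions require with palette
    palette='viridis',
    legend=False
)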
plt.title('Top 10 Business Types for Establishment in Top 5 Hotspots')
plt.xlabel('Number of Businesses')
plt.ylabel('Industry Classification')
plt.show()
Step 15: Predicting Success Rate for New Businesses in Top 10 Industries within Top 5 Hotspots
This section describes how to develop a predictive model to estimate the success rate for starting a new business in one of the top 10 business activities within the top 5 hotspots identified in Melbourne.
1. Define Success Criteria
- Longevity-Based Success: A business could be considered successful if it has been operational for a certain number of years (e.g., 2 years or more).
- Revenue-Based Success: If revenue data is available, success can be defined by meeting a certain revenue threshold.
2. Feature Engineering
- Industry Classification: Include the business industry as a categorical feature.
- Location (Hotspot): Use the cluster or hotspot as a feature.
- Historical Success Data: Incorporate historical data about business longevity or success rates in specific industries and hotspots.
3. Model Selection
- Use a classification model such as Logistic Regression, Random Forest, or Gradient Boosting to predict whether a new business will be successful based on the above features.
- Features:
  - Industry classification (one of the top 10 industries).
  - Geographical location (hotspot cluster).
  - Other relevant features (e.g., initial investment if available).
4. Training the Model
- Split the Data: Divide your data into training and testing sets.
- Model Training: Train your classification model using historical business data.
- Evaluation: Evaluate the model on the test set to check accuracy, precision, recall, and other relevant metrics.
5. Prediction
- Input: New business data (e.g., industry classification, hotspot location).
- Output: Probability of success (e.g., 80% chance of being successful).
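The longevity-based criterion from item 1 can be sketched as follows. This is one possible construction under stated assumptions, not necessarily how the label used in this notebook was built: a business is identified by the pair `property_id` and `trading_name` (an assumption), and it is marked successful if it appears in at least `MIN_YEARS` distinct census years. The toy data is made up.

```python
import pandas as pd

# Hypothetical records: one row per business per census year.
toy = pd.DataFrame({
    "property_id": [1, 1, 1, 2, 3, 3],
    "trading_name": ["Cafe A", "Cafe A", "Cafe A", "Shop B", "Firm C", "Firm C"],
    "census_year": [2015, 2016, 2017, 2016, 2016, 2017],
})

# Assumed identity key for "the same business" across census years.
key = ["property_id", "trading_name"]

# Number of distinct census years in which each business was observed.
years_observed = toy.groupby(key)["census_year"].transform("nunique")

MIN_YEARS = 2  # longevity threshold from the success criterion
toy["success"] = (years_observed >= MIN_YEARS).astype(int)
print(toy)
```

Businesses that first appear in the most recent census year cannot reach the threshold yet, so in practice they would need to be excluded or handled separately before training.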
Example Workflow
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, classification_report
# Feature selection: industry classification and hotspot cluster
X = df_cleaned[['industry_anzsic4_description', 'cluster']]
X = pd.get_dummies(X, columns=['industry_anzsic4_description', 'cluster'])
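# Target: success (0 or 1)
# Assumes a binary 'success' column has already been added to df_cleaned
# according to the criteria in "1. Define Success Criteria"; it is not part of the raw dataset
y = df_cleaned['success']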
# Split the data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Train the model
model = RandomForestClassifier(random_state=42)
model.fit(X_train, y_train)
# Predict and evaluate
y_pred = model.predict(X_test)
print("Accuracy:", accuracy_score(y_test, y_pred))
print(classification_report(y_test, y_pred))
# Predict success probability for a new business
new_business = {'industry_anzsic4_description': 'Cafes and Restaurants', 'cluster': 2}
new_business_df = pd.DataFrame([new_business])
new_business_df = pd.get_dummies(new_business_df).reindex(columns=X.columns, fill_value=0)
success_probability = model.predict_proba(new_business_df)[0][1]
print(f"Predicted success probability: {success_probability:.2f}")
Accuracy: 0.9857481220816133
              precision    recall  f1-score   support

           0       0.33      0.00      0.00      1052
           1       0.99      1.00      0.99     72833

    accuracy                           0.99     73885
   macro avg       0.66      0.50      0.50     73885
weighted avg       0.98      0.99      0.98     73885

Predicted success probability: 0.99
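The report shows a recall of 0.00 for class 0 while class 1 makes up 72833 of the 73885 test rows, so the headline accuracy is worth comparing against a model that always predicts the majority class. The sketch below does that arithmetic using only the support counts and accuracy printed above:

```python
# Support counts and accuracy taken from the classification report above.
support = {0: 1052, 1: 72833}
reported_accuracy = 0.9857481220816133

total = sum(support.values())
majority_class = max(support, key=support.get)

# Accuracy of a model that always predicts the majority class.
baseline_accuracy = support[majority_class] / total

print(f"Majority class: {majority_class}")
print(f"Baseline accuracy: {baseline_accuracy:.5f}")
print(f"Reported accuracy: {reported_accuracy:.5f}")
print(f"Difference: {reported_accuracy - baseline_accuracy:+.5f}")
```

When the difference is close to zero, accuracy says little about the model's ability to identify unsuccessful businesses; per-class recall or a class-weighted model would be more informative in that case.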
!pip install dash
!pip install dash-bootstrap-components
!pip install pyngrok
import requests
import pandas as pd
from io import StringIO
# Function to collect data
def collect_data(dataset_id):
    base_url = 'https://data.melbourne.vic.gov.au/api/explore/v2.1/catalog/datasets/'
    export_format = 'csv'  # renamed from `format` to avoid shadowing the built-in
    url = f'{base_url}{dataset_id}/exports/{export_format}'
    params = {
        'select': '*',
        'limit': -1, # all records
        'lang': 'en',
        'timezone': 'UTC',
    }
    # GET request
    response = requests.get(url, params=params)
    if response.status_code == 200:
        # StringIO to read the CSV data
        url_content = response.content.decode('utf-8')
        dataset = pd.read_csv(StringIO(url_content), delimiter=';')
        return dataset
    else:
        print(f'Request failed with status code {response.status_code}')
        return None
# Set dataset_id to query for the API call
dataset_id = 'business-establishments-with-address-and-industry-classification'
# Save dataset to df variable
df = collect_data(dataset_id)
if df is not None:
    # Check number of records in df
    print(f'The dataset contains {len(df)} records.')
    # Save the DataFrame to a CSV file
    df.to_csv('melbourne_business_data.csv', index=False)
    print("Dataset saved as 'melbourne_business_data.csv'")
    # Optionally view the first few rows of the dataset
    print(df.head(5))
The dataset contains 374210 records.
Dataset saved as 'melbourne_business_data.csv'
| | census_year | block_id | property_id | base_property_id | clue_small_area | trading_name | business_address | industry_anzsic4_code | industry_anzsic4_description | longitude | latitude |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2017 | 266 | 109851 | 109851 | Carlton | Metropoli's Research Pty Ltd | Level 1, 74 Victoria Street CARLTON 3053 | 6950 | Market Research and Statistical Services | 144.965352 | -37.806701 |
| 1 | 2017 | 266 | 109851 | 109851 | Carlton | J Hong Restaurant | Ground , 74 Victoria Street CARLTON 3053 | 4511 | Cafes and Restaurants | 144.965352 | -37.806701 |
| 2 | 2017 | 266 | 534003 | 534003 | Carlton | St2 Expresso | 70 Victoria Street CARLTON 3053 | 4512 | Takeaway Food Services | 144.965473 | -37.806714 |
| 3 | 2017 | 266 | 664003 | 664003 | Carlton | RMIT Resources Ltd | 20 Cardigan Street CARLTON 3053 | 8102 | Higher Education | 144.964753 | -37.806312 |
| 4 | 2017 | 266 | 664005 | 664005 | Carlton | vacant | 24 Cardigan Street CARLTON 3053 | 0 | Vacant Space | 144.964772 | -37.806203 |
from google.colab import files
files.download('melbourne_business_data.csv')
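from google.colab import files
import pandas as pd
# Upload the cleaned CSV from the local machine (Colab only)
uploaded = files.upload()
# Note: Colab renames a repeat upload (e.g. "melbourne_business_data (1).csv"),
# so this fixed path reads the copy that was already in /content
df_cleaned = pd.read_csv('/content/melbourne_business_data.csv')
# Preview the first five rows
print(df_cleaned.head())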
Saving melbourne_business_data.csv to melbourne_business_data (1).csv
   census_year  block_id  property_id  base_property_id  clue_small_area  \
0         2017       266       109851            109851          Carlton
1         2017       266       109851            109851          Carlton
2         2017       266       534003            534003          Carlton
3         2017       266       664003            664003          Carlton
4         2017       266       664005            664005          Carlton

                   trading_name                          business_address  \
0  Metropoli's Research Pty Ltd  Level 1, 74 Victoria Street CARLTON 3053
1             J Hong Restaurant  Ground , 74 Victoria Street CARLTON 3053
2                  St2 Expresso           70 Victoria Street CARLTON 3053
3            RMIT Resources Ltd           20 Cardigan Street CARLTON 3053
4                        vacant           24 Cardigan Street CARLTON 3053

   industry_anzsic4_code              industry_anzsic4_description  \
0                   6950  Market Research and Statistical Services
1                   4511                     Cafes and Restaurants
2                   4512                    Takeaway Food Services
3                   8102                          Higher Education
4                      0                              Vacant Space

    longitude    latitude
0  144.965352  -37.806701
1  144.965352  -37.806701
2  144.965473  -37.806714
3  144.964753  -37.806312
4  144.964772  -37.806203
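Before mapping, it can help to confirm that every coordinate pair is plausible and not merely non-missing. The sketch below is one way to do that in plain Python. The bounding box values and the helper names `in_bounds` and `split_by_bounds` are illustrative assumptions, not part of the dataset or of the original analysis.

```python
# Rough, illustrative bounding box around the City of Melbourne.
# These limits are an assumption, not values from the dataset documentation.
LAT_MIN, LAT_MAX = -37.90, -37.75
LON_MIN, LON_MAX = 144.85, 145.05


def in_bounds(lat, lon):
    """Return True if (lat, lon) is a numeric pair inside the bounding box."""
    try:
        lat, lon = float(lat), float(lon)
    except (TypeError, ValueError):
        return False
    # NaN fails every comparison, so missing values are rejected here too.
    return LAT_MIN <= lat <= LAT_MAX and LON_MIN <= lon <= LON_MAX


def split_by_bounds(points):
    """Split (lat, lon) pairs into (kept, rejected) lists."""
    kept, rejected = [], []
    for lat, lon in points:
        (kept if in_bounds(lat, lon) else rejected).append((lat, lon))
    return kept, rejected


sample = [
    (-37.806701, 144.965352),    # first row of the preview above
    (144.965352, -37.806701),    # latitude and longitude swapped
    (float("nan"), 144.964753),  # missing latitude
]
kept, rejected = split_by_bounds(sample)
print(len(kept), "kept,", len(rejected), "rejected")  # prints "1 kept, 2 rejected"
```

The same check could be applied to the DataFrame row by row, for example with `df_cleaned.apply(lambda r: in_bounds(r['latitude'], r['longitude']), axis=1)` as a boolean mask.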
import folium
import pandas as pd
from folium.plugins import HeatMap
# Load your cleaned data
df_cleaned = pd.read_csv('/content/melbourne_business_data.csv')
# Drop rows with a missing latitude or longitude
df_cleaned = df_cleaned.dropna(subset=['latitude', 'longitude'])
# Build the list of [latitude, longitude] points for the heatmap (one point per establishment, no aggregation)
heatmap_data = df_cleaned[['latitude', 'longitude']].values.tolist()
# Create a base map centered around Melbourne
melbourne_map = folium.Map(location=[-37.8136, 144.9631], zoom_start=12)
# Add the HeatMap layer to the map
HeatMap(heatmap_data, radius=10).add_to(melbourne_map)
# Add markers for the prominent hotspots
hotspots = [
    {"name": "Hotspot 1", "lat": -37.81649338709314, "lon": 144.9593746816832},
    {"name": "Hotspot 2", "lat": -37.81027385193321, "lon": 144.96895152469776},
    {"name": "Hotspot 3", "lat": -37.80272317246289, "lon": 144.9454911043396},
    {"name": "Hotspot 4", "lat": -37.81482455911137, "lon": 144.91942891148435},
    {"name": "Hotspot 5", "lat": -37.83819289766458, "lon": 144.97715040816652},
]
# Add a marker for each hotspot
for hotspot in hotspots:
    folium.Marker(
        location=[hotspot["lat"], hotspot["lon"]],
        popup=hotspot["name"],
        icon=folium.Icon(color='red', icon='info-sign')
    ).add_to(melbourne_map)
# Display the map
melbourne_map
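The five marker coordinates in the cell above are hard-coded, so a quick independent check of where establishments concentrate can be useful. The sketch below counts points per square grid cell and returns the busiest cells. This is a simple grid count swapped in for illustration, not the clustering method used to produce the hotspots, and the function name `top_grid_cells` and the 0.005 degree cell size are assumptions.

```python
import math
from collections import Counter


def top_grid_cells(points, cell_deg=0.005, n=5):
    """Count points per square grid cell and return the n busiest cells.

    points   : iterable of (lat, lon) pairs
    cell_deg : cell size in degrees (hypothetical default, roughly 550 m north-south)
    Returns a list of ((centre_lat, centre_lon), count), busiest first.
    """
    counts = Counter(
        (math.floor(lat / cell_deg), math.floor(lon / cell_deg))
        for lat, lon in points
    )
    return [
        (((i + 0.5) * cell_deg, (j + 0.5) * cell_deg), count)
        for (i, j), count in counts.most_common(n)
    ]


# The five coordinate pairs from the data preview earlier in this notebook
sample = [
    (-37.806701, 144.965352),
    (-37.806701, 144.965352),
    (-37.806714, 144.965473),
    (-37.806312, 144.964753),
    (-37.806203, 144.964772),
]
result = top_grid_cells(sample)
for (lat, lon), count in result:
    print(f"cell centre ({lat:.4f}, {lon:.4f}): {count} establishments")
```

On the full dataset, the function could be called with `heatmap_data` as `points`. If the busiest cells sit close to the red markers, the two views agree. The cell size trades resolution against noise and would need tuning.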